Using neighborhood information for automated categorization of Web pages
Author
Abstract
In this paper we discuss several issues related to how expanding a Web document's representation affects the quality of topical categorization of Web pages. We consider expanding a Web page with the text content of its linking pages. We show that naive expansion can pull in too much noise and substantially harm categorization results. We present an approach to automated pruning of linking Web pages. We report that using our approach to form a Web page representation always leads to better results than traditional single-page categorization.
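The idea of expanding a page with pruned linking pages can be illustrated with a minimal sketch. The abstract does not state which pruning criterion the authors use, so the cosine-similarity threshold below is an assumption made purely for illustration, and the names `bag_of_words`, `cosine`, `expand_representation`, and `threshold` are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def expand_representation(page_text, linking_texts, threshold=0.2):
    """Merge the target page's terms with those of similar linking pages.

    Assumption: a linking page is kept only if its cosine similarity to
    the target page is at least `threshold`. The abstract does not say
    which pruning criterion the authors actually use.
    """
    target = bag_of_words(page_text)
    expanded = Counter(target)
    kept = []
    for text in linking_texts:
        neighbor = bag_of_words(text)
        if cosine(target, neighbor) >= threshold:
            expanded.update(neighbor)  # add the neighbor's term counts
            kept.append(text)
    return expanded, kept


if __name__ == "__main__":
    page = "python programming tutorial for beginners"
    neighbors = [
        "learn python programming with this tutorial",
        "cheap flights and hotel deals",
    ]
    expanded, kept = expand_representation(page, neighbors)
    print(kept)
```

With `threshold=0` every linking page is merged, which corresponds to the naive expansion described above; raising the threshold discards linking pages that share little vocabulary with the target page before the expanded term counts are passed to a classifier.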
Similar resources
A Technique for Improving Web Mining using Enhanced Genetic Algorithm
The World Wide Web is growing at a very fast pace and makes a lot of information available to the public. Search engines use conventional methods to retrieve information on the Web; however, their search results can still be refined and their accuracy is not high enough. One of the methods for web mining is evolutionary algorithms which search according to the user interests...
Analyzing new features of infected web content in detection of malicious web pages
Recent improvements in web standards and technologies enable attackers to hide and obfuscate malicious code with new methods and thus escape security filters. In this paper, we study the application of machine learning techniques in detecting malicious web pages. In order to detect malicious web pages, we propose and analyze a novel set of features including HTML, JavaScript (jQuery...
Investigating the relationship between information quality and external markers in Persian web pages related to public health
Introduction: One approach to evaluate the quality of a web page is to investigate its external markers. The purpose of the present study is to determine the relationship between information quality of Persian public health web pages and their external quality. Methods: The samples of this correlation study were selected from among the freely available ten-key word texts of chronic diseases...
On The Automated Classification of Web Pages Using Artificial Neural Network
The World Wide Web is growing at an uncontrollable rate. Hundreds of thousands of web sites appear every day, with the added challenge of keeping the web directories up-to-date. Further, the uncontrolled nature of the Web presents difficulties for Web page classification. As the number of Internet users is growing, so is the need for classification of web pages with greater precision in order to pr...